Confidence distribution (CD) – distribution estimator of a parameter

Kesar Singh, Minge Xie and William E. Strawderman

Rutgers University 

Abstract: The notion of confidence distribution (CD), an entirely frequentist 
concept, is in essence a Neymanian interpretation of Fisher's Fiducial distri- 
bution. It contains information related to every kind of frequentist inference. 
In this article, a CD is viewed as a distribution estimator of a parameter. This 
leads naturally to consideration of the information contained in CD, com- 
parison of CDs and optimal CDs, and connection of the CD concept to the 
(profile) likelihood function. A formal development of a multiparameter CD is 
also presented. 



1. Introduction and the concept 

We are happy to dedicate this article to the memory of our colleague Yehuda Vardi. 
He was supportive of our efforts to develop this research area and in particular 
brought his paper with Colin Mallows (Mallows and Vardi) to our attention
during the discussion. A confidence-distribution (CD) is a compact expression of 
frequentist inference which contains information on just about every kind of infer- 
ential problem. The concept of a CD has its roots in Fisher's fiducial distribution, 
although it is a purely frequentist concept with a purely frequentist interpretation. 
Simply speaking, a CD of a univariate parameter is a data-dependent distribution
whose $s$-th quantile is the upper end of a $100s\%$-level one-sided confidence interval
of $\theta$. This assertion clearly entails that, for any $0 < s < t < 1$, the interval formed
by the $s$-th and $t$-th quantiles of a CD is a $100(t - s)\%$-level two-sided confidence
interval. Thus, a CD is in fact a Neymanian interpretation of Fisher's fiducial distribution
(Neyman). The concept of CD has appeared in a number of research articles.
However, the modern statistical community has largely ignored the notion, particularly
in applications. We suspect two probable causes lie behind this: (i) the first is
its historic connection to Fisher's fiducial distribution, which is largely considered
as "Fisher's biggest blunder" (see, for instance, Efron [8]); (ii) statisticians have
not seriously looked at the possible utility of CDs in the context of modern statistical
practice. As pointed out by Schweder and Hjort, there has recently been
a renewed interest in this topic. Some recent articles include Efron, Fraser,
Lehmann, Schweder and Hjort [22, 23], and Singh, Xie and Strawderman [25, 26],
among others. In particular, recent articles emphasize the Neymanian interpretation
of the CD and present it as a valuable statistical tool for inference.
For example, Schweder and Hjort [23] proposed a reduced likelihood function from
the CDs for inference, and Singh, Xie and Strawderman [26] developed attractive
comprehensive approaches through the CDs to combining information from independent
sources. The following quotation from Efron on Fisher's contribution
of the Fiducial distribution seems quite relevant in the context of CDs: ". . . but 
here is a safe prediction for the 21st century: statisticians will be asked to solve 
bigger and more complicated problems. I believe there is a good chance that objec- 
tive Bayes methods will be developed for such problems, and that something like 
fiducial inference will play an important role in this development. Maybe Fisher's 
biggest blunder will become a big hit in the 21st century!" 

In the remainder of this section, we give a formal definition of a confidence distri- 
bution and the associated notion of an asymptotic confidence distribution (aCD), 
provide a simple method of constructing CDs as well as several examples of CDs 
and aCDs. In the following formal definition of the CDs, nuisance parameters are 
suppressed for notational convenience. It is taken from Singh, Xie and Strawderman
[25, 26]. The CD definition is essentially the same as in Schweder and Hjort [22];
they did not define the asymptotic CD, however.

Definition 1.1. A function $H_n(\cdot) = H_n(\mathbf{X}_n, \cdot)$ on $\mathcal{X} \times \Theta \to [0,1]$ is called a
confidence distribution (CD) for a parameter $\theta$, if (i) for each given sample set $\mathbf{X}_n$ in
the sample set space $\mathcal{X}$, $H_n(\cdot)$ is a continuous cumulative distribution function in the
parameter space $\Theta$; (ii) at the true parameter value $\theta = \theta_0$, $H_n(\theta_0) = H_n(\mathbf{X}_n, \theta_0)$,
as a function of the sample set $\mathbf{X}_n$, has a uniform distribution $U(0, 1)$.

The function $H_n(\cdot)$ is called an asymptotic confidence distribution (aCD), if
requirement (ii) above is replaced by (ii)': at $\theta = \theta_0$, $H_n(\theta_0) \to U(0, 1)$ in law, as
$n \to +\infty$, and the continuity requirement on $H_n(\cdot)$ is dropped.

We call, when it exists, $h_n(\theta) = H_n'(\theta)$ a CD density. It is also known as a confidence
density in the literature. It follows from the definition of CD that if $\theta < \theta_0$,
$H_n(\theta) \overset{sto}{\le} 1 - H_n(\theta)$, and if $\theta > \theta_0$, $1 - H_n(\theta) \overset{sto}{\le} H_n(\theta)$. Here, $\overset{sto}{\le}$ is a
stochastic comparison between two random variables; i.e., for two random variables $Y_1$ and $Y_2$,
$Y_1 \overset{sto}{\le} Y_2$ if $P(Y_1 \le t) \ge P(Y_2 \le t)$ for all $t$. Thus a CD works, in a sense, like
a compass needle. It points towards $\theta_0$, when placed at $\theta \ne \theta_0$, by assigning more
mass stochastically to that side (left or right) of $\theta$ that contains $\theta_0$. When placed
at $\theta_0$ itself, $H_n(\theta) = H_n(\theta_0)$ has the uniform $U[0,1]$ distribution and thus it is
noninformative in direction.

The interpretation of a CD as a distribution estimator is as follows. The purpose 
of analyzing sample data is to gather knowledge about the population from which 
the sample came. The unknown $\theta$ is a characteristic of the population. Though
useful, the knowledge acquired from the data analysis is imperfect in the sense
that there is still a, usually known, degree of uncertainty remaining. Statisticians
can present the acquired knowledge on $\theta$, with the left-over uncertainty, in the
form of a probability distribution. This appropriately calibrated distribution, which
reflects statisticians' confidence regarding where $\theta$ lives, is a CD. Thus, a CD is an
expression of inference (an inferential output) and not a distribution on $\theta$. What is
really fascinating is that a CD is loaded with a wealth of information about $\theta$ (as
is detailed later), as is a posterior distribution in Bayesian inference.

Before we give some illustrative examples, let us describe a general substitution
scheme for the construction of CDs that avoids inversion of functions; see also
Schweder and Hjort. Although this scheme does not cover all possible ways of
constructing CDs (see, for example, Section 4), it covers a wide range of examples
involving pivotal statistics.

Consider a statistical function $\psi(\mathbf{X}_n, \theta)$ which involves the data set $\mathbf{X}_n$ and
the parameter of interest $\theta$. Besides $\theta$, the function $\psi$ may contain some known
parameters (which should be treated as constants) but it should not have any other
unknown parameter. On $\psi$, we impose the following condition:

• For any given $\mathbf{X}_n$, $\psi(\mathbf{X}_n, \theta)$ is continuous and monotonic as a function of $\theta$.

Suppose further that $G_n$, the true c.d.f. of $\psi(\mathbf{X}_n, \theta)$, does not involve any unknown
parameter and it is analytically tractable. In such a case, $\psi(\mathbf{X}_n, \theta)$ is generally
known as a pivot. Then one has the following exact CD for $\theta$ (provided $G_n(\cdot)$
is continuous):



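$$H_n(x) = \begin{cases} G_n(\psi(\mathbf{X}_n, x)), & \text{if } \psi \text{ is increasing in } \theta, \\ 1 - G_n(\psi(\mathbf{X}_n, x)), & \text{if } \psi \text{ is decreasing in } \theta. \end{cases}$$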



In most cases, $H_n(x)$ is a continuous c.d.f. for fixed $\mathbf{X}_n$ and, as a function
of $\mathbf{X}_n$, $H_n(\theta_0)$ follows a $U[0, 1]$ distribution. Thus, $H_n$ is a CD by definition. Note
the substitution of $\theta$ by $x$.
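
The substitution scheme can be sketched in a few lines of Python. The sketch below assumes a normal sample with known standard deviation, an assumption made here only so that the pivot's c.d.f. $G_n$ is the standard normal and is available in the standard library; the function name `cd_normal_mean` and the data are illustrative and not from the paper.

```python
import math
from statistics import NormalDist

def cd_normal_mean(sample, sigma):
    """Return the function H_n(x) for a normal mean, sigma assumed known.

    Pivot: psi(X_n, theta) = (xbar - theta) / (sigma / sqrt(n)).
    It is decreasing in theta and its c.d.f. G_n is the standard normal.
    """
    n = len(sample)
    xbar = sum(sample) / n
    se = sigma / math.sqrt(n)
    G = NormalDist().cdf  # G_n, free of unknown parameters

    def H(x):
        psi = (xbar - x) / se   # substitute x for theta in the pivot
        return 1.0 - G(psi)     # decreasing pivot, so use 1 - G_n

    return H

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]  # made-up data
H = cd_normal_mean(sample, sigma=0.5)
xbar = sum(sample) / len(sample)
print(H(xbar - 0.2), H(xbar), H(xbar + 0.2))
```

Evaluated at the sample mean, this CD equals 1/2, so it splits its mass evenly on either side of the point estimate.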

In case the sampling distribution $G_n$ is unavailable, including the case in which
$G_n$ depends on unknown nuisance parameters, one can turn to an approximate or
estimated sampling distribution $\hat G_n$. This could be the limit of $G_n$, an estimate of
the limit or an estimate based on bootstrap or some other method. Utilizing $\hat G_n$,
one defines

$$H_n(x) = \begin{cases} \hat G_n(\psi(\mathbf{X}_n, x)), & \text{if } \psi \text{ is increasing in } \theta, \\ 1 - \hat G_n(\psi(\mathbf{X}_n, x)), & \text{if } \psi \text{ is decreasing in } \theta. \end{cases}$$



In most cases, $H_n(\theta_0) \to U[0, 1]$ in law and $H_n$ is thus an asymptotic CD. The above
construction resembles Beran's construction of the prepivot (see Beran, page 459), which
was defined at $\theta = \theta_0$ (the true value of $\theta$). Beran's goal was to achieve second order
accuracy in general via double bootstrap.

We now present some illustrative examples of CDs. 

Example 1.1. (Normal mean and variance) The most basic case is that of sampling
from a normal distribution with parameters $\mu$ and $\sigma^2$. Consider first a CD for
the mean when the variance is unknown. Here the standard pivot is $\psi(\mathbf{X}_n, \mu) =
(\bar X_n - \mu)/(s_n/\sqrt{n})$, which has the Student $t$-distribution with $(n-1)$ d.f. Using the
above substitution, the CD for $\mu$ is

$$H_n(x) = 1 - P\left(T_{n-1} \le \frac{\bar X_n - x}{s_n/\sqrt{n}}\right) = P\left(T_{n-1} \le \frac{x - \bar X_n}{s_n/\sqrt{n}}\right),$$

where $T_{n-1}$ is a random variable that has the Student's $t_{n-1}$-distribution.

For $\sigma^2$, the usual pivot is $\psi(\mathbf{X}_n, \sigma^2) = (n-1)s_n^2/\sigma^2$. By the substitution method,
the CD for $\sigma^2$ is

$$H_n(x) = P\left(\chi^2_{n-1} \ge \frac{(n-1)s_n^2}{x}\right), \quad x \ge 0, \tag{1.1}$$

where $\chi^2_{n-1}$ is a random variable that has the Chi-square distribution with $n - 1$
degrees of freedom.

Example 1.2. (Bivariate normal correlation) For a bivariate normal population,
let $\rho$ denote the correlation coefficient. The asymptotic pivot used in this example
is Fisher's Z, $\psi(\mathbf{X}_n, \rho) = [\log((1 + r)/(1 - r)) - \log((1 + \rho)/(1 - \rho))]/2$, where $r$ is
the sample correlation. Its limiting distribution is $N(0, \frac{1}{n-3})$, with a fast rate of
convergence. So the resulting asymptotic CD is

$$H_n(x) = 1 - \Phi\left(\sqrt{n-3}\left(\frac{1}{2}\log\frac{1+r}{1-r} - \frac{1}{2}\log\frac{1+x}{1-x}\right)\right), \quad -1 \le x \le 1.$$
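
As a hedged illustration of Example 1.2, the asymptotic CD based on Fisher's Z can be coded directly from the displayed formula. The function name `fisher_z_acd` and the values of `r` and `n` are hypothetical choices for the sketch.

```python
import math
from statistics import NormalDist

def fisher_z_acd(r, n):
    """Asymptotic CD for a correlation coefficient via Fisher's Z.

    r is the sample correlation and n the sample size (n > 3).
    """
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    Phi = NormalDist().cdf

    def H(x):
        # The CD lives on [-1, 1]; handle the endpoints explicitly.
        if x <= -1.0:
            return 0.0
        if x >= 1.0:
            return 1.0
        z_x = 0.5 * math.log((1 + x) / (1 - x))
        return 1.0 - Phi(math.sqrt(n - 3) * (z_r - z_x))

    return H

H = fisher_z_acd(r=0.6, n=50)  # illustrative values only
print(H(0.4), H(0.6), H(0.75))
```

Because the pivot is decreasing in $\rho$, the sketch uses the `1 - G` branch of the substitution scheme; the CD crosses 1/2 exactly at the sample correlation.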

Example 1.3. (Nonparametric bootstrap) Turning to nonparametric examples based
on bootstrap, let $\hat\theta$ be an estimator of $\theta$, such that the limiting distribution of
$\hat\theta$, properly normalized, is symmetric. Using symmetry, if the sampling distribution
of $\hat\theta - \theta$ is estimated by the bootstrap distribution of $\hat\theta - \hat\theta_B$, then an asymptotic
CD is given by

$$H_n(x) = 1 - P_B(\hat\theta - \hat\theta_B \le \hat\theta - x) = P_B(\hat\theta_B \le x).$$

Here, $\hat\theta_B$ is $\hat\theta$ computed on a bootstrap sample. The resulting asymptotic CD is the
raw bootstrap distribution of $\hat\theta$.

If the distribution of $\hat\theta - \theta$ is estimated by the bootstrap distribution of $\hat\theta_B - \hat\theta$,
which is what bootstrappers usually do, the corresponding asymptotic CD is

$$H_n(x) = 1 - P_B(\hat\theta_B - \hat\theta \le \hat\theta - x) = P_B(\hat\theta_B \ge 2\hat\theta - x).$$
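
The two bootstrap constructions in Example 1.3 can be sketched as follows. This is only an illustration under assumed inputs: the helper name `bootstrap_acds`, the choice of the sample median as estimator, the number of resamples and the data are all hypothetical.

```python
import random
from statistics import median

def bootstrap_acds(sample, estimator, n_boot=2000, seed=0):
    """Return (H_raw, H_reflected, theta_hat) built from bootstrap replicates."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    # Each replicate is the estimator evaluated on a resample drawn with replacement.
    boot = [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]

    def H_raw(x):
        # P_B(theta_B <= x): the raw bootstrap distribution
        return sum(b <= x for b in boot) / n_boot

    def H_reflected(x):
        # P_B(theta_B >= 2 * theta_hat - x): the second form in the example
        return sum(b >= 2 * theta_hat - x for b in boot) / n_boot

    return H_raw, H_reflected, theta_hat

sample = [2.3, 3.1, 1.8, 4.4, 2.9, 3.6, 2.2, 5.0, 3.3, 2.7]  # made-up data
H_raw, H_reflected, theta_hat = bootstrap_acds(sample, median)
print(theta_hat, H_raw(theta_hat), H_reflected(theta_hat))
```

Both functions are step functions, which is consistent with the aCD definition since continuity is not required there.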

Example 1.4. (Bootstrap-t method) By the bootstrap-t method, the distribution
of the asymptotic pivot $(\hat\theta - \theta)/SE(\hat\theta)$ is estimated by the bootstrap distribution of
$(\hat\theta_B - \hat\theta)/SE_B(\hat\theta_B)$. Here $SE_B(\hat\theta_B)$ is the estimated standard error of $\hat\theta_B$, based on
the bootstrap sample. Such an approximation has so-called second order accuracy
(see Singh [24], Babu and Singh). The resulting asymptotic CD would be

$$H_n(x) = 1 - P_B\left(\frac{\hat\theta_B - \hat\theta}{SE_B(\hat\theta_B)} \le \frac{\hat\theta - x}{SE(\hat\theta)}\right).$$

Such a CD, at $x = \theta_0$, typically converges to $U[0, 1]$, in law, at a rapid pace.

Example 1.5. (Bootstrap 3rd order accurate aCD) Hall came up with the
following increasing function of the $t$-statistic, which does not have the $1/\sqrt{n}$-term
in its Edgeworth expansion:

$$\psi(\mathbf{X}_n, \mu) = t + \frac{\hat\lambda}{6\sqrt{n}}(2t^2 + 1) + \frac{1}{27n}\hat\lambda^2 t^3.$$

Here $t = \sqrt{n}(\bar X - \mu)/s_n$, $\hat\lambda$ is a sample estimate of the skewness coefficient $\lambda$, and the
assumption of population normality is dropped. Under mild conditions on the population
distribution, the bootstrap approximation to the distribution of this function of $t$
is third-order correct. Let $\hat G_B$ be the c.d.f. of the bootstrap approximation. Then,
using the substitution, a second-order correct CD for $\mu$ is given by

$$H_n(x) = 1 - \hat G_B(\psi(\mathbf{X}_n, x)).$$

One also has CDs that do not involve pivotal statistics. A particular class of 
such CDs are constructed from likelihood functions. We will have some detailed 
discussions on the connections of CDs and likelihood functions in Section 4. 

For each given sample $\mathbf{X}_n$, $H_n(\cdot)$ is a cumulative distribution function. We can
construct a random variable $\xi$ such that $\xi$ has the distribution $H_n$. For convenience
of presentation, we call $\xi$ a CD random variable.

Definition 1.2. We call $\xi = \xi_{H_n}$ a CD random variable associated with a CD $H_n$,
if the conditional distribution of $\xi$ given the data $\mathbf{X}_n$ is $H_n$.

As an example, let $U$ be a $U(0, 1)$ random variable independent of $\mathbf{X}_n$, then
$\xi = H_n^{-1}(U)$ is a CD random variable.

Let us note that $\xi$ may be viewed as a CD-randomized estimator of $\theta_0$. As an
estimator, $\xi$ is median unbiased, i.e., $P_{\theta_0}(\xi \le \theta_0) = E_{\theta_0}\{H_n(\theta_0)\} = \frac{1}{2}$. However, $\xi$
is not always mean unbiased. For example, the CD random variable $\xi$ associated
with (1.1) in Example 1.1 is mean biased as an estimator of $\sigma^2$.
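
A small simulation can illustrate the construction $\xi = H_n^{-1}(U)$ and its median unbiasedness. The sketch assumes the known-variance normal-mean CD $H_n(x) = \Phi((x - \bar X)/(\sigma/\sqrt{n}))$ as a convenient closed-form case; the helper name `draw_cd_rv`, the seed and all numeric settings are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def draw_cd_rv(sample, sigma, rng):
    """Draw xi = H_n^{-1}(U) for H_n(x) = Phi((x - xbar) / (sigma / sqrt(n)))."""
    n = len(sample)
    xbar = sum(sample) / n
    u = rng.random()
    while u == 0.0:          # inv_cdf needs a value strictly inside (0, 1)
        u = rng.random()
    return xbar + (sigma / math.sqrt(n)) * NormalDist().inv_cdf(u)

rng = random.Random(12345)
theta0, sigma, n, reps = 2.0, 1.0, 15, 4000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(theta0, sigma) for _ in range(n)]
    hits += draw_cd_rv(sample, sigma, rng) <= theta0
frac_below = hits / reps
print(frac_below)  # expected to be near 1/2 under median unbiasedness
```

The fraction of draws falling at or below the true value estimates $P_{\theta_0}(\xi \le \theta_0)$, with the probability taken jointly over the data and the auxiliary uniform variable.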

We close this section with an equivariance result on CDs, which may be helpful in
the construction of a CD for a function of $\theta$, for example, to derive a CD for $\sigma$ from
that of $\sigma^2$ given in Example 1.1. The equivariance is shared by Efron's bootstrap
distribution of an estimator, which is of course an aCD (Example 1.3 above) under
conditions.

Proposition 1.1. Let $H_n$ be a CD for $\theta$ and $\xi$ be an associated CD random variable.
Then, the conditional distribution function of $g(\xi)$, for given $\mathbf{X}_n$, is a CD of $g(\theta)$,
if $g$ is monotonic. When the monotonicity is limited to a neighborhood of $\theta_0$ only,
then the conditional distribution of $g(\xi)$, for given $\mathbf{X}_n$, yields an asymptotic CD at
$\theta = \theta_0$, provided, for all $\epsilon > 0$, $H_n(\theta_0 + \epsilon) - H_n(\theta_0 - \epsilon) \to 1$ in probability.

Proof. The proof of the first claim is straightforward. For the second claim, we
note that, if $g(\cdot)$ is increasing within $(\theta_0 - \epsilon, \theta_0 + \epsilon)$, $P(g(\xi) \le g(\theta_0) \mid \mathbf{x}) = P(\{\xi \le
\theta_0\} \cap \{\theta_0 - \epsilon \le \xi \le \theta_0 + \epsilon\} \mid \mathbf{x}) + o_p(1) = H_n(\theta_0) + o_p(1)$. One argues similarly for
decreasing $g(\cdot)$. □

The rest of the paper is arranged as follows. Section 2 is devoted to comparing 
CDs for the same parameter and related issues. In Section 3, we explore, from the 
frequentist viewpoint, inferential information contained within a CD. In Section 4, 
we establish that the normalized profile likelihood function is an aCD. Lastly, Sec- 
tion 5 is an attempt to formally define and develop the notion of joint CD for a 
parameter vector. Parts of Sections 2 and 3 are closely related to the recent paper of
Schweder and Hjort [22], and also to Singh, Xie and Strawderman [25, 26]. Schweder
and Hjort [22] present essentially the same definition of the CD and also compare
CDs as we do in this paper (see Definition 2.1). They also develop the notion of
an optimal CD which is quite close to that presented here and in Singh, Xie and
Strawderman [26]. Our development is based on the theory of UMPU tests and
differs slightly from theirs. The material on p-values in Section 3.3 is also closely
related to, but somewhat more general than, that of Fraser.



2. Comparison of CDs and a notion of optimal CD 

The precision of a CD can be measured in terms of how little probability mass a
CD wastes on sets that do not include $\theta_0$. This suggests that, for $\epsilon > 0$, one should
compare the quantities $H_1(\theta_0 - \epsilon)$ with $H_2(\theta_0 - \epsilon)$ and also $1 - H_1(\theta_0 + \epsilon)$ and
$1 - H_2(\theta_0 + \epsilon)$. In each case, a smaller value is preferred. Here $H_1$ and $H_2$ are any
two CDs for the common parameter $\theta$, based on the same sample of size $n$.

Definition 2.1. Given two CDs $H_1$ and $H_2$ for $\theta$, we say $H_1$ is more precise than
$H_2$, at $\theta = \theta_0$, if for all $\epsilon > 0$,

$$H_1(\theta_0 - \epsilon) \overset{sto}{\le} H_2(\theta_0 - \epsilon) \quad \text{and} \quad 1 - H_1(\theta_0 + \epsilon) \overset{sto}{\le} 1 - H_2(\theta_0 + \epsilon)$$

when $\theta_0$ is the prevailing value of $\theta$.

An essentially equivalent definition is also used in Singh, Xie and Strawderman
[25] and Schweder and Hjort. The following proposition follows immediately
from the definition.

Proposition 2.1. If $H_1$ is more precise than $H_2$ and they both are strictly increasing,
then for all $t$ in $[0, 1]$,

$$[H_1^{-1}(t) - \theta_0]^+ \overset{sto}{\le} [H_2^{-1}(t) - \theta_0]^+ \quad \text{and} \quad [H_1^{-1}(t) - \theta_0]^- \overset{sto}{\le} [H_2^{-1}(t) - \theta_0]^-.$$

Thus,

$$|H_1^{-1}(t) - \theta_0| \overset{sto}{\le} |H_2^{-1}(t) - \theta_0|. \tag{2.1}$$

The statement (2.1) yields a comparison of confidence intervals based on $H_1$ and
$H_2$. In general, an endpoint of a confidence interval based on $H_1$ is closer to $\theta_0$ than
that based on $H_2$. In some sense, it implies that the $H_1$ based confidence intervals
are more compact. Also, the CD median of a more precise CD is stochastically
closer to $\theta_0$.
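
The idea of wasted probability mass can be illustrated by simulation. The sketch below compares two exact CDs for a normal mean with known standard deviation (an assumption made for convenience): one built from the full sample and one from only the first half of it. It estimates the average mass each CD places beyond $\theta_0 \mp \epsilon$, which is a check in expectation rather than a full check of the stochastic ordering. The function name `cd_at` and all settings are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

Phi = NormalDist().cdf

def cd_at(sample, sigma, x):
    """Known-sigma normal-mean CD evaluated at x."""
    se = sigma / math.sqrt(len(sample))
    return Phi((x - fmean(sample)) / se)

rng = random.Random(7)
theta0, sigma, n, eps, reps = 0.0, 1.0, 20, 0.3, 2000
left_full, left_half, right_full, right_half = [], [], [], []
for _ in range(reps):
    full = [rng.gauss(theta0, sigma) for _ in range(n)]
    half = full[: n // 2]  # a deliberately less efficient CD
    left_full.append(cd_at(full, sigma, theta0 - eps))
    left_half.append(cd_at(half, sigma, theta0 - eps))
    right_full.append(1 - cd_at(full, sigma, theta0 + eps))
    right_half.append(1 - cd_at(half, sigma, theta0 + eps))

# Average mass placed to the left of theta0 - eps and to the right of theta0 + eps
print(fmean(left_full), fmean(left_half))
print(fmean(right_full), fmean(right_half))
```

In this setup the full-sample CD should waste less mass on both sides, in line with the intuition behind Definition 2.1.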

Let $\phi(x, \theta)$ be a loss function such that $\phi(\cdot, \cdot)$ is non-decreasing for $x \ge \theta$ and
non-increasing for $x \le \theta$. We now connect the above defined CD-comparison to the
following concept of the $\phi$-dispersion of a CD.

Definition 2.2. For a CD $H(x)$ of a parameter $\theta$, the $\phi$-dispersion of $H(x)$ is
defined as

$$d_\phi(\theta, H) = E_\theta \int \phi(x, \theta)\, dH(x).$$

In the special case of square error loss, $d_{sq}(\theta, H) = E_\theta \int (x - \theta)^2\, dH(x)$. In general,
we have the following:
we have the following: 

Theorem 2.1. If $H_1$ is more precise than $H_2$ at $\theta = \theta_0$, in terms of Definition
2.1, then

$$d_\phi(\theta_0, H_1) \le d_\phi(\theta_0, H_2). \tag{2.2}$$

In fact, the above theorem holds under a set of weaker conditions: for any $\epsilon > 0$,

$$E\{H_1(\theta_0 - \epsilon)\} \le E\{H_2(\theta_0 - \epsilon)\} \quad \text{and} \quad E\{H_1(\theta_0 + \epsilon)\} \ge E\{H_2(\theta_0 + \epsilon)\}. \tag{2.3}$$

Proof. The claim in (2.2) is equivalent to

$$E\{\phi(\xi_1, \theta_0)\} \le E\{\phi(\xi_2, \theta_0)\}, \tag{2.4}$$

where $\xi_1$ and $\xi_2$ are CD random variables associated with $H_1$ and $H_2$ (see Definition
1.2), respectively. From (2.3), via conditioning on $\mathbf{X}_n$, it follows that

$$(\xi_1 - \theta_0)^+ \overset{sto}{\le} (\xi_2 - \theta_0)^+ \quad \text{and} \quad (\xi_1 - \theta_0)^- \overset{sto}{\le} (\xi_2 - \theta_0)^-.$$

Due to the monotonicity of $\phi(\cdot, \theta_0)$, we have

$$\phi(\xi_1, \theta_0) 1_{(\xi_1 \ge \theta_0)} \overset{sto}{\le} \phi(\xi_2, \theta_0) 1_{(\xi_2 \ge \theta_0)} \quad \text{and} \quad \phi(\xi_1, \theta_0) 1_{(\xi_1 < \theta_0)} \overset{sto}{\le} \phi(\xi_2, \theta_0) 1_{(\xi_2 < \theta_0)}.$$

The above inequalities lead to (2.4) immediately. □

Suppose now that there is a family of Uniformly Most Powerful Unbiased tests
for testing $K_0 : \theta \le \theta_0$ versus $K_1 : \theta > \theta_0$, for every $\theta_0$. The underlying family of
distributions may have nuisance parameter(s). Let the corresponding p-value (the
inf of $\alpha$ at which $K_0$ can be rejected) $p(\theta_0) = p(\mathbf{X}_n, \theta_0)$ be strictly increasing and
continuous as a function of $\theta_0$. It is further assumed that $1 - p(\theta_0)$ is the p-value of
an UMPU test for testing $K_0 : \theta \ge \theta_0$ vs $K_1 : \theta < \theta_0$. Let the distribution of $p(\theta_0)$
under $\theta_0$ be $U[0, 1]$ and let the range of $p(\cdot)$ be $[0, 1]$. Define the corresponding CD,
$H^*(x) = p(\mathbf{X}_n, x)$. We have the following result.

Theorem 2.2. The CD $H^*$ defined above is more precise than any other CD for
the parameter $\theta$, at all $\theta_0$.

Proof. Let $\theta = \theta_0$ be the true value. Note that $P_{\theta_0}(H^*(\theta_0 - \epsilon) \le \alpha)$ is the power
(at $\theta = \theta_0$) of the UMPU test when $K_0$ is $\theta \le \theta_0 - \epsilon$ and $K_1$ is $\theta > \theta_0 - \epsilon$. Given any
other CD $H$, one has the following unbiased test for testing the same hypotheses:
reject $K_0$ iff $H(\theta_0 - \epsilon) \le \alpha$. Therefore, $P_{\theta_0}(H^*(\theta_0 - \epsilon) \le \alpha) \ge P_{\theta_0}(H(\theta_0 - \epsilon) \le \alpha)$
for all $\alpha \in [0, 1]$. Using the function $1 - p(\cdot)$, one similarly argues for
$P_{\theta_0}(1 - H^*(\theta_0 + \epsilon) \le \alpha) \ge P_{\theta_0}(1 - H(\theta_0 + \epsilon) \le \alpha)$. Thus, $H^*$ is most precise. □

It should be mentioned that the property of CDs as exhibited in Theorem 2.2
depends on corresponding optimality properties of hypothesis tests. The basic ideas
behind this segment could be traced to the discussions of confidence intervals in
Lehmann [14].



Remark 2.1. If the underlying parametric family has the so-called MLR (monoto- 
ne likelihood ratio) property, there exists an UMP test for one-sided hypotheses 
whose p- value is monotonic. 

Example 2.1. In the testing problem of normal means, the Z-test is UMPU (actually
UMP) for the one-sided hypotheses when $\sigma$ is known. The t-test is UMPU
for the one-sided hypotheses when $\sigma$ is a nuisance parameter (see Lehmann,
Chapter 5). The conclusion: $H^*(x) = \Phi\left(\frac{x - \bar X}{\sigma/\sqrt{n}}\right)$ is the most precise CD for $\mu$, when
$\sigma$ is known, and $H^{**}(x) = F_{t_{n-1}}\left(\frac{x - \bar X}{s_n/\sqrt{n}}\right)$ is the most precise CD, when $\sigma$ is not
known. Here, $F_{t_{n-1}}$ is the cumulative distribution function of the $t$-distribution with
degrees of freedom $n - 1$.

The above presented optimality theory can be expressed in the decision theoretic 
framework by considering the "target distribution" towards which a CD is supposed 
to converge. Given $\theta_0$ as the true value of $\theta$, the target distribution is $\delta(\theta_0)$, the
Dirac $\delta$-measure at $\theta_0$, which assigns its 100% probability mass at $\theta_0$ itself. A loss
function can be defined in terms of "distance" between a CD $H(\cdot)$ and $\delta(\theta_0)$.

Perhaps the most popular distance between two distributions $F$ and $G$ is the
Kolmogorov-Smirnov distance $= \sup_x |F(x) - G(x)|$. However, it turns out that this
particular distance, between a CD and its target $\delta(\theta_0)$, is useless for comparing two
CDs. To see this, note that

$$E_{\theta_0}\left[\sup_x |H(x) - I_{[\theta_0, \infty)}(x)|\right] = E_{\theta_0}\left[\max\{H(\theta_0), 1 - H(\theta_0)\}\right] = 3/4 \quad (\text{free of } H!),$$

since $H(\theta_0)$ follows the $U[0, 1]$ distribution. Note, $I_{[\theta_0, \infty)}$ is the cdf of $\delta(\theta_0)$. So, we
instead consider the integrated distance

$$\tau(H, \delta(\theta_0)) = \int \psi\left(|H(x) - I_{[\theta_0, \infty)}(x)|\right) dW(x),$$

where $\psi(\cdot)$ is a monotonic function from $[0, 1]$ to $R^+$ and $W(\cdot)$ is a positive measure.
The risk function is

$$R_{\theta_0}(H) = E_{\theta_0}[\tau(H, \delta(\theta_0))].$$

Theorem 2.3. If $H_1$ is a more precise CD than $H_2$ at $\theta = \theta_0$, then $R_{\theta_0}(H_1) \le R_{\theta_0}(H_2)$.
Theorem 2.3 is proved by interchanging the expectation and the integration 
appearing in the loss function, which is allowed by Fubini's Theorem. 
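
The claim that the expected Kolmogorov-Smirnov distance to the target equals 3/4 for every exact CD can be checked numerically. Since $H(\theta_0)$ is $U[0, 1]$, it suffices to average $\max\{U, 1 - U\}$ over uniform draws; the seed and the number of replications below are arbitrary choices for this sketch.

```python
import random

rng = random.Random(2024)
reps = 20000
# For an exact CD, H(theta0) is U[0,1]; the sup-distance to the point mass
# at theta0 is then max(H(theta0), 1 - H(theta0)).
ks_values = [max(u, 1 - u) for u in (rng.random() for _ in range(reps))]
mean_ks = sum(ks_values) / reps
print(mean_ks)  # should be close to 3/4 regardless of which CD is used
```

The simulation never references a particular CD, which is exactly why this distance cannot discriminate between competing CDs.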

Now, for asymptotic CDs, we define two asymptotic notions of "more precise
CD". One is in the Pitman sense of "local alternatives" and the other is in the
Bahadur sense, when $\theta \ne \theta_0$ is held fixed and the a.s. limit is taken on the CDs
themselves.

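First, the Pitman-more precise CD:

Definition 2.3. Let $H_1$ and $H_2$ be two asymptotic CDs. We say that $H_1$ is Pitman-
more precise than $H_2$ if, for every $\epsilon > 0$,

$$\lim P\left(H_{1n}\left(\theta_0 - \frac{\epsilon}{\sqrt{n}}\right) \le t\right) \ge \lim P\left(H_{2n}\left(\theta_0 - \frac{\epsilon}{\sqrt{n}}\right) \le t\right)$$

and

$$\lim P\left(1 - H_{1n}\left(\theta_0 + \frac{\epsilon}{\sqrt{n}}\right) \le t\right) \ge \lim P\left(1 - H_{2n}\left(\theta_0 + \frac{\epsilon}{\sqrt{n}}\right) \le t\right),$$

where all the limits (as $n \to \infty$) are assumed to exist, and the probabilities are
under $\theta = \theta_0$.

Thus, we are requiring that in terms of the limiting distributions,

$$H_{1n}\left(\theta_0 - \frac{\epsilon}{\sqrt{n}}\right) \overset{sto}{\le} H_{2n}\left(\theta_0 - \frac{\epsilon}{\sqrt{n}}\right)$$

and

$$1 - H_{1n}\left(\theta_0 + \frac{\epsilon}{\sqrt{n}}\right) \overset{sto}{\le} 1 - H_{2n}\left(\theta_0 + \frac{\epsilon}{\sqrt{n}}\right).$$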

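Example 2.2. The most basic example allowing such a comparison is that of

$$H_{in}(x) = \Phi\left(\frac{x - \hat\theta_i}{a_i/\sqrt{n}}\right), \quad i = 1, 2,$$

where $\hat\theta_1$, $\hat\theta_2$ are two $\sqrt{n}$-consistent asymptotically normal estimators of $\theta$, with
asymptotic variances $a_1^2/n$ and $a_2^2/n$ respectively. The one with a smaller asymptotic
variance is Pitman-more precise.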

Next, the Bahadur-type comparison:

Definition 2.4. We define $H_1$ to be Bahadur-more precise than $H_2$ (when $\theta = \theta_0$)
if, for every $\epsilon > 0$, a.s.

$$\lim \frac{1}{n} \log H_1(\theta_0 - \epsilon) \le \lim \frac{1}{n} \log H_2(\theta_0 - \epsilon)$$

and

$$\lim \frac{1}{n} \log\left(1 - H_1(\theta_0 + \epsilon)\right) \le \lim \frac{1}{n} \log\left(1 - H_2(\theta_0 + \epsilon)\right),$$

where the limits, as $n \to \infty$, are assumed to exist.

Here too, we are saying that in a limit sense, $H_1$ places less mass than $H_2$ does
on half-lines which exclude $\theta_0$. This comparison is on the $H_i$ directly and not on their
distributions.

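Example 2.3. Let us return to the example of Z vs t for normal means. The
fact that $\Phi\left(\frac{x - \bar X}{\sigma/\sqrt{n}}\right)$ is Bahadur-more precise than $F_{t_{n-1}}\left(\frac{x - \bar X}{s_n/\sqrt{n}}\right)$ follows from the
well-known limits:

$$\frac{1}{n} \log \Phi(-\epsilon\sqrt{n}) \to -\epsilon^2/2 \quad \text{and} \quad \frac{1}{n} \log F_{t_{n-1}}(-\epsilon\sqrt{n}) \to -\frac{1}{2} \log(1 + \epsilon^2).$$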
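
The normal limit quoted in Example 2.3 can be checked numerically with the standard library. The sketch below evaluates $\frac{1}{n}\log\Phi(-\epsilon\sqrt{n})$ for growing $n$ and compares it with $-\epsilon^2/2$; the $t$-limit is only evaluated in closed form for comparison, since a $t$ c.d.f. is not in the standard library. The helper name and the values of `eps` and `n` are illustrative.

```python
import math

def log_normal_lower_tail(x):
    """log Phi(-x) for x > 0, via the complementary error function."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

eps = 0.5
for n in (100, 1000, 4000):
    print(n, log_normal_lower_tail(eps * math.sqrt(n)) / n)

slope_normal = log_normal_lower_tail(eps * math.sqrt(4000)) / 4000
limit_normal = -eps ** 2 / 2                 # limit for the Z-based CD
limit_t = -0.5 * math.log(1 + eps ** 2)      # limit for the t-based CD
print(limit_normal, limit_t)
```

Because $\log(1 + \epsilon^2) \le \epsilon^2$, the normal limit is the smaller (more negative) of the two, which is the ordering required by Definition 2.4.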

Remark 2.2. Under modest regularity conditions,

$$\lim \frac{1}{n} \log H_n(\theta_0 - \epsilon) = \lim \frac{1}{n} \log h_n(\theta_0 - \epsilon)$$

and

$$\lim \frac{1}{n} \log[1 - H_n(\theta_0 + \epsilon)] = \lim \frac{1}{n} \log h_n(\theta_0 + \epsilon).$$

The right hand sides are CD-density slopes, which have significance of their own
for CDs. The faster the CD density goes to 0, at fixed $\theta \ne \theta_0$, the more compact,
in limit, the CD is.



3. Information contained in a CD 



This section discusses inference on $\theta$ from a CD or an aCD. We briefly consider basic
elements of inference about $\theta$, including confidence intervals, point estimation, and
hypothesis testing.



3.1. Confidence intervals 



The derivation of confidence intervals from a CD is straightforward and well known.
Note that, according to the definition, the intervals $(-\infty, H_n^{-1}(\alpha)]$ and $[H_n^{-1}(\alpha),
+\infty)$ are one-sided confidence intervals for $\theta$, for any $\alpha \in (0, 1)$. It is also clear that
the central regions of the CD, i.e., $(H_n^{-1}(\alpha/2), H_n^{-1}(1 - \alpha/2))$, provide two-sided
confidence intervals for $\theta$ at each coverage level $1 - \alpha$, $\alpha \in (0, 1)$. The same is true for an
aCD, where the confidence level is achieved in limit.
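
Reading intervals off a CD amounts to evaluating its quantile function. As a minimal sketch, the code below does this for the known-variance normal-mean CD, where $H_n^{-1}$ has a closed form; the function names and the data are hypothetical, and the known $\sigma$ is an assumption of the sketch.

```python
import math
from statistics import NormalDist, fmean

def cd_quantile(sample, sigma, p):
    """H_n^{-1}(p) for H_n(x) = Phi((x - xbar) / (sigma / sqrt(n)))."""
    se = sigma / math.sqrt(len(sample))
    return fmean(sample) + se * NormalDist().inv_cdf(p)

def central_interval(sample, sigma, alpha):
    """Two-sided interval (H^{-1}(alpha/2), H^{-1}(1 - alpha/2))."""
    return (cd_quantile(sample, sigma, alpha / 2),
            cd_quantile(sample, sigma, 1 - alpha / 2))

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]  # made-up data
sigma = 0.5
lo, hi = central_interval(sample, sigma, alpha=0.05)
upper_95 = cd_quantile(sample, sigma, 0.95)  # (-inf, upper_95] is one-sided
print(lo, hi, upper_95)
```

For this CD the central region reproduces the familiar z-interval, sample mean plus or minus a normal quantile times the standard error.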



3.2. Point estimators 

A CD (or an aCD) on the real line can be a tool for point estimation as well. We 
assume the following condition, which is mild and almost always met in practice. 

(3.1) For any $\epsilon$, $0 < \epsilon < \frac{1}{2}$, and each fixed $\theta_0$, $L_n(\epsilon) = H_n^{-1}(1 - \epsilon) - H_n^{-1}(\epsilon) \to 0$,
in probability, as $n \to \infty$.

Condition (3.1) states that the CD based information concentrates around $\theta_0$ as $n$
gets large.

One natural choice for a point estimator of the parameter $\theta$ is the median of a CD
(or an aCD), $M_n = H_n^{-1}(1/2)$. Note that $M_n$ is a median-unbiased estimator, even
if the original estimator, on which $H_n$ is based, is not. For instance, this does happen
in the case of the CD for the normal variance, based on the $\chi^2$-distribution. This
median unbiased result follows from the observation that $P_{\theta_0}(M_n \le \theta_0) = P_{\theta_0}(1/2 \le
H_n(\theta_0)) = 1/2$. The following is a consistency result on $M_n$.

Theorem 3.1. (i) If condition (3.1) is true, then $M_n \to \theta_0$, in probability, as
$n \to \infty$. (ii) Furthermore, if $L_n(\epsilon) = O_p(a_n)$, for a non-negative $a_n \to 0$, then
$M_n - \theta_0 = O_p(a_n)$.

Proof. (i) We first note the identity: for $\alpha \in (0, \frac{1}{2})$,

$$P_{\theta_0}(|M_n - \theta_0| > \delta) = P_{\theta_0}(\{|M_n - \theta_0| > \delta\} \cap \{L_n(\alpha) > \delta\}) + P_{\theta_0}(\{|M_n - \theta_0| > \delta\} \cap \{L_n(\alpha) \le \delta\}).$$

Under the assumed condition, the first term in the r.h.s. $\to 0$. We prove that the
second term is $\le 2\alpha$. This is deduced from the set inequality:

$$\{|M_n - \theta_0| > \delta\} \cap \{L_n(\alpha) \le \delta\} \subseteq \{H_n(\theta_0) \le \alpha\} \cup \{H_n(\theta_0) \ge 1 - \alpha\}.$$

To conclude the above set inequality, one needs to consider the two cases $M_n >
\theta_0 + \delta$ and $M_n < \theta_0 - \delta$ separately. The first case leads to $\{H_n(\theta_0) \le \alpha\}$ and the
second one to $\{H_n(\theta_0) \ge 1 - \alpha\}$. Part (i) follows, since $\alpha$ is arbitrary.

One can prove part (ii) by using similar reasoning. □

One can also use the average of a CD (or an aCD), $\bar\theta_n = \int_{-\infty}^{\infty} t\, dH_n(t)$, to
consistently estimate the unknown $\theta_0$. Indeed, $\bar\theta_n$ is the frequentist analog of the Bayesian
estimator of $\theta$ under the usual square loss.
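
When a CD has no closed-form inverse, its median and mean can be obtained numerically. The sketch below does this for the Fisher Z asymptotic CD of Example 1.2, using bisection for the median and a midpoint rule, after integration by parts, for the mean. The function names, step counts and the values `r = 0.6`, `n = 50` are assumptions for illustration.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def make_acd(r, n):
    """Fisher Z asymptotic CD on (-1, 1); atanh(x) = 0.5 * log((1+x)/(1-x))."""
    z_r = math.atanh(r)
    return lambda x: 1.0 - Phi(math.sqrt(n - 3) * (z_r - math.atanh(x)))

def cd_median(H, lo, hi, tol=1e-10):
    """Solve H(m) = 1/2 by bisection; H must be increasing on (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cd_mean(H, lo, hi, steps=4000):
    """Integral of t dH(t) on [lo, hi], assuming H(lo) = 0 and H(hi) = 1.

    Integration by parts gives hi - integral of H over [lo, hi];
    the remaining integral is approximated by the midpoint rule.
    """
    width = (hi - lo) / steps
    area = sum(H(lo + (i + 0.5) * width) for i in range(steps)) * width
    return hi - area

H = make_acd(r=0.6, n=50)
median_cd = cd_median(H, -1.0, 1.0)
mean_cd = cd_mean(H, -1.0, 1.0)
print(median_cd, mean_cd)
```

In this example the CD median coincides with the sample correlation, while the CD mean sits slightly below it because the CD is skewed on the correlation scale.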

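Theorem 3.2. Under condition (3.1), if $r_n = \int t^2\, dH_n(t)$ is bounded in probability,
then $\bar\theta_n \to \theta_0$, in probability.

Proof. Using the Cauchy-Schwarz inequality, we have, for any $0 < \epsilon < 1/2$,

$$\left|\int_{-\infty}^{H_n^{-1}(\epsilon)} t\, dH_n(t) + \int_{H_n^{-1}(1-\epsilon)}^{\infty} t\, dH_n(t)\right| \le 2 r_n \epsilon^{1/2}.$$

Thus,

$$(1 - \epsilon) H_n^{-1}(\epsilon) - 2 r_n \epsilon^{1/2} \le \bar\theta_n \le (1 - \epsilon) H_n^{-1}(1 - \epsilon) + 2 r_n \epsilon^{1/2}.$$

Now, with $M_n = H_n^{-1}(1/2)$,

$$|\bar\theta_n - \theta_0| \le |M_n - \theta_0| + |\bar\theta_n - M_n| \le |M_n - \theta_0| + |H_n^{-1}(1 - \epsilon) - H_n^{-1}(\epsilon)| + 2 r_n \epsilon^{1/2} + 2\epsilon |M_n|.$$

Since $\epsilon > 0$ is arbitrary, the result follows using Theorem 3.1. □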

Denote $\hat\theta_n = \arg\max_\theta h_n(\theta)$, the value that maximizes the CD (or aCD) density
function $h_n(\theta) = \frac{d}{d\theta} H_n(\theta)$. Let $\epsilon_n = \inf_{0 < \epsilon \le 1/2}\{\epsilon : \hat\theta_n \notin [H_n^{-1}(\epsilon), H_n^{-1}(1 - \epsilon)]\}$. The
event $\epsilon_n \ge \epsilon^*$ is that $\hat\theta_n$ will not be in the tails having probability less than $\epsilon^*$. We
have the following theorem.

Theorem 3.3. Assume condition (3.1) holds. Suppose there exists a fixed $\epsilon^* > 0$,
such that $P(\epsilon_n \ge \epsilon^*) \to 1$. Then, $\hat\theta_n \to \theta_0$ in probability.

Proof. Note that $\hat\theta_n \in [H_n^{-1}(\epsilon^*), H_n^{-1}(1 - \epsilon^*)]^c$ implies $\epsilon_n \le \epsilon^*$. The claim follows
immediately using (3.1) and Theorem 3.1. □

3.3. Hypothesis testing

Now, let us turn to one-sample hypothesis testing, given that a CD $H_n(\cdot)$ is available
on a parameter of interest $\theta$. Suppose the null hypothesis is $K_0 : \theta \in C$ versus the
alternative $K_1 : \theta \in C^c$. A natural line of thinking would be to measure the support
that $H_n(\cdot)$ lends to $C$. If the support is "high," the verdict based on $H_n$ should
be for $C$ and if it is low, it should be for $C^c$. The following two definitions of
support for $K_0$ from a CD are suggested by classical p-values. These two notions of
support highlight the distinction between the two kinds of p-values used in statistical
practice, one for the one-sided hypotheses and the other for the point hypotheses.

I. Strong-support $p_s(C)$, defined as $p_s(C) = H_n(C)$, which is the probability content
of $C$ under $H_n$.

II. Weak-support $p_w(C)$, defined as $p_w(C) = \sup_{\theta \in C} 2 \min(H_n(\theta), 1 - H_n(\theta))$.

See, e.g., Cox and Hinkley [6], Barndorff-Nielsen and Cox, and especially Fraser
for discussions on this topic related to p-value functions. Our results are closely
related to those in Fraser but they are developed under a more general setting.
We use the following claim for making the connection between the concepts of support
and the p-values.

Claim A. If $K_0$ is of the type $(-\infty, \theta_0]$ or $[\theta_0, \infty)$, the classical p-value typically
agrees with the strong-support $p_s(C)$. If $K_0$ is a singleton, i.e. $K_0$ is $\theta = \theta_0$,
then the p-value typically agrees with the weak-support $p_w(C)$.

To illustrate the above claim, consider tests based on the normalized statistic
$T_n = (\hat\theta - \theta)/SE(\hat\theta)$, for an arbitrary estimator $\hat\theta$. Based on the method given in
Section 1.1, a CD for $\theta$ is $H_n(x) = P_{\eta_n}(\hat\theta - SE(\hat\theta)\eta_n \le x)$, where $\eta_n$ is independent
of the data $\mathbf{X}_n$ and $\eta_n$ has the same distribution as $T_n$. Thus,

$$p_s((-\infty, \theta_0]) = H_n(\theta_0) = P_{\eta_n}\left(\eta_n \ge \frac{\hat\theta - \theta_0}{SE(\hat\theta)}\right).$$

This agrees with the p-value for the one-sided test $K_0 : \theta \le \theta_0$ versus $K_1 : \theta > \theta_0$.
Similar demonstrations can be given for the tests based on studentized statistics.
If the null hypothesis is $K_0 : \theta = \theta_0$ vs $K_1 : \theta \ne \theta_0$, the standard p-value, based
on $T_n$, is twice the tail probability beyond $(\hat\theta - \theta_0)/SE(\hat\theta)$ under the distribution
of $T_n$. This equals

$$2 \min[H_n(\theta_0), 1 - H_n(\theta_0)] = p_w(\theta_0).$$

Remark 3.1. It should be remarked here that $p_w(\theta_0)$ is the Tukey's depth (see
Tukey [27]) of the point $\theta_0$ w.r.t. $H_n(\cdot)$.
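
Claim A can be illustrated numerically in the simplest setting. The sketch below assumes a normal sample with known standard deviation, computes the strong and weak supports from the CD, and compares them with z-test p-values computed directly; the function name `supports`, the data and `theta0` are hypothetical.

```python
import math
from statistics import NormalDist, fmean

Phi = NormalDist().cdf

def supports(sample, sigma, theta0):
    """Strong supports of the two half-lines and weak support of {theta0}."""
    se = sigma / math.sqrt(len(sample))
    H0 = Phi((theta0 - fmean(sample)) / se)   # H_n(theta0)
    p_s_left = H0                              # p_s((-inf, theta0])
    p_s_right = 1.0 - H0                       # p_s([theta0, inf))
    p_w_point = 2.0 * min(H0, 1.0 - H0)        # p_w({theta0})
    return p_s_left, p_s_right, p_w_point

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]  # made-up data
sigma, theta0 = 0.5, 4.7
p_left, p_right, p_point = supports(sample, sigma, theta0)

# Classical z-test p-values, computed without reference to the CD
z = (fmean(sample) - theta0) / (sigma / math.sqrt(len(sample)))
p_one_sided = 1.0 - Phi(z)             # K0: theta <= theta0 vs K1: theta > theta0
p_two_sided = 2.0 * (1.0 - Phi(abs(z)))
print(p_left, p_one_sided)
print(p_point, p_two_sided)
```

The strong support of the half-line matches the one-sided p-value and the weak support of the singleton matches the two-sided p-value, up to floating-point rounding.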

The following inequality justifies the names of the two supports.

Theorem 3.4. For any set $C$, $p_s(C) \le p_w(C)$.

Proof. Suppose the sup in the definition of $p_w(C)$ is attained at $\theta'$, which may or
may not be in $C$, and $\theta' \le M_n$ (recall $M_n$ = the median of $H_n$). Let $\theta''$ be the point
to the right of $M_n$ such that $1 - H_n(\theta'') = H_n(\theta')$. Then $C \subseteq (-\infty, \theta'] \cup [\theta'', \infty)$,
ignoring a possible null set under $H_n$. As a consequence

$$p_s(C) = H_n(C) \le H_n((-\infty, \theta']) + H_n([\theta'', \infty)) = p_w(C).$$

Similar arguments are given when $\theta' > M_n$. □

The next three theorems justify Claim A.

Theorem 3.5. Let $C$ be of the type $(-\infty, \theta_0]$ or $[\theta_0, \infty)$. Then $\sup_{\theta \in C} P_\theta(p_s(C) \le
\alpha) = \alpha$.

Proof. Let $C = (-\infty, \theta_0]$. For a $\theta \le \theta_0$, $P_\theta(H_n((-\infty, \theta_0]) \le \alpha) \le P_\theta(H_n((-\infty,
\theta]) \le \alpha) = \alpha$. When $\theta = \theta_0$, one has the equality. A similar proof is given when
$C = [\theta_0, \infty)$. □

A limiting result of the same nature holds for a more general null hypothesis,
namely a union of finitely many disjoint closed intervals (bounded or unbounded).
Assume the following regularity condition: as $n \to \infty$,

$$\sup_{\theta \in [a, b]} P_\theta\left(\max\{H_n(\theta - \epsilon), 1 - H_n(\theta + \epsilon)\} > \delta\right) \to 0 \tag{3.2}$$

for any finite $a$, $b$ and positive $\epsilon$, $\delta$.

Essentially, the condition (3.2) assumes that the scale of $H_n(\cdot)$ shrinks to 0,
uniformly in $\theta$ lying in a compact set.

Theorem 3.6. Let $C = \bigcup_{j=1}^{k} I_j$ where the $I_j$ are disjoint intervals of the type $(-\infty, a]$
or $[c, d]$ or $[b, \infty)$. If the regularity condition (3.2) holds, then $\sup_{\theta \in C} P_\theta(p_s(C) \le
\alpha) \to \alpha$, as $n \to \infty$.

Proof. It suffices to prove the claim with the sup over $\theta \in I_j$ for each $j = 1, \ldots, k$.
Consider first the case when $I_j = (-\infty, a]$. For this $I_j$ and any $\delta > 0$,

$$\sup_{\theta \in I_j} P_\theta(p_s(C) \le t) \ge P_{\theta = a}(p_s(C) \le t) \ge P_{\theta = a}(p_s(I_j) \le t - \delta) + o(1) = t - \delta + o(1).$$

The second inequality is due to (3.2). Also, from Theorem 3.5,

$$\sup_{\theta \in I_j} P_\theta(p_s(C) \le t) \le \sup_{\theta \in I_j} P_\theta(p_s(I_j) \le t) = t,$$

which completes the proof for this $I_j$. The case of $I_j = [b, \infty)$ is handled similarly.
Turning to the case $I_j = [c, d]$, $c < d$, we write it as the union of $I_{j1} = [c, \frac{c+d}{2}]$ and
$I_{j2} = [\frac{c+d}{2}, d]$, and note that, for any $\delta > 0$, it follows from (3.1) that

$$\sup_{\theta \in I_{j1}} P_\theta(p_s(C) \le t) \ge P_{\theta = c}(p_s(C) \le t) \ge P_{\theta = c}(p_s([c, \infty)) \le t - \delta) + o(1) = t - \delta + o(1).$$

Furthermore, from Theorem 3.5, we have for any $\delta > 0$,

$$\sup_{\theta \in I_{j1}} P_\theta(p_s(C) \le t) \le \sup_{\theta \in I_{j1}} P_\theta(p_s(I_j) \le t) \le \sup_{\theta \in I_{j1}} P_\theta(p_s([c, \infty)) \le t + \delta) + o(1) = t + \delta + o(1).$$

The case of sup over $\theta \in I_{j2}$ is dealt with in a similar way. In the arguments $\theta = c$
is replaced by $\theta = d$. □

Remark 3.2. The result of Theorem 3.6 still holds if $p_s(C)$ is replaced by
$p^* = \max_{1 \le j \le k} p_s(I_j)$. The use of $p^*$ as a p-value amounts to the
so-called Intersection Union Test (see Berger [S]). Since $p^*$ as a p-value gives a
larger rejection region than $p_s(\bigcup_{j=1}^{k} I_j)$ as a p-value, testing by $p^*$
will have better power, for the same asymptotic size. If the intervals $I_j$ are
unbounded, it follows that

$$\sup_{\theta \in C} P_\theta(p_s(C) \le t) \le \sup_{\theta \in C} P_\theta(p^* \le t) \le t.$$

Moving on to the situation when $K_0$ is $\theta = \theta_0$ and $K_1$ is $\theta \ne \theta_0$,
it is immediate that $p_w(\theta_0) = 2\min\{H_n(\theta_0), 1 - H_n(\theta_0)\}$ has the
$U[0, 1]$ distribution, since $H_n(\theta_0)$ does so. Thus $p_w(\theta_0)$ can be used like a
p-value in the case of such a testing problem. In a more general situation when
$K_0$ is $C = \{\theta_1, \theta_2, \ldots, \theta_k\}$ and $K_1$ is $C^c$, one has
the following asymptotic result.

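Theorem 3.7. Let $K_0$ be $C = \{\theta_1, \ldots, \theta_k\}$ and $K_1$ be $C^c$. Assume that
$H_n(\theta_i - \epsilon)$ and $1 - H_n(\theta_i + \epsilon) \xrightarrow{p} 0$ under
$\theta = \theta_i$, for all $i = 1, 2, \ldots, k$. Then
$\max_{\theta \in C} P_\theta(p_w(C) \le \alpha) \to \alpha$, as $n \to \infty$.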

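Proof. For simplicity, let $C = \{\theta_1, \theta_2\}$, $\theta_1 < \theta_2$. Under the
condition, clearly $p_w(\theta_2) \le 2\{1 - H_n(\theta_2)\} \xrightarrow{p} 0$, if
$\theta = \theta_1$. Since $p_w(\theta_1)$ has the $U[0, 1]$ distribution under
$\theta = \theta_1$, it follows, using standard arguments, that
$\max\{p_w(\theta_1), p_w(\theta_2)\} \to U[0, 1]$ in distribution when $\theta = \theta_1$.
The same holds under $\theta = \theta_2$. The result thus follows. □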
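
The behavior described in Theorem 3.7 can be illustrated by simulation. This is a sketch under stated assumptions: a normal sample mean with known standard deviation supplies the CD, and $p_w(C)$ is taken to be the maximum of $p_w$ over the points of $C$, as the proof above suggests. The function names and numerical settings are hypothetical.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def p_w(theta0, xbar, scale):
    """Two-sided p-value 2 * min{H_n(theta0), 1 - H_n(theta0)} for the
    assumed CD H_n(t) = PHI((t - xbar) / scale)."""
    h = PHI((theta0 - xbar) / scale)
    return 2.0 * min(h, 1.0 - h)


def p_w_set(points, xbar, scale):
    """p_w for a finite null set, taken here as the maximum over its points."""
    return max(p_w(t, xbar, scale) for t in points)


def size_at(theta, points, n, sigma, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of P_theta(p_w(C) <= alpha)."""
    rng = random.Random(seed)
    scale = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        xbar = rng.gauss(theta, scale)
        if p_w_set(points, xbar, scale) <= alpha:
            rejections += 1
    return rejections / reps


null_points = (0.0, 1.0)
size_theta1 = size_at(0.0, null_points, n=100, sigma=1.0, alpha=0.05)
size_theta2 = size_at(1.0, null_points, n=100, sigma=1.0, alpha=0.05)
print(size_theta1, size_theta2)
```

With the two null points well separated relative to the scale of $H_n$, both estimated sizes should be close to the nominal level.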

Example 3.1. (Bio-equivalence). An important example of the case where $C$, the
null space, is a union of closed intervals is provided by the standard bioequivalence
problem. In this testing problem $K_0$ is the region
$\mu_1/\mu_2 \in (-\infty, .8] \cup [1.25, \infty)$, where $\mu_1$, $\mu_2$ are the
population means of bioavailability measures of two drugs being
tested for equivalence.
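
For the bioequivalence null region, the support $p_s(C)$ and the intersection-union value $p^*$ of Remark 3.2 are both easy to compute once a CD for the ratio is available. The sketch below simply assumes a normal (asymptotic) CD centered at a hypothetical estimate with a hypothetical standard error; it does not address how such a CD would be obtained from data.

```python
from statistics import NormalDist


def bioequivalence_p_values(estimate, std_error, lower=0.8, upper=1.25):
    """Return (p_s(C), p*) for C = (-inf, lower] U [upper, inf).

    The CD for the ratio is ASSUMED to be normal with the given center
    and spread; both inputs are illustrative, not from the paper.
    """
    cd = NormalDist(mu=estimate, sigma=std_error)
    left = cd.cdf(lower)          # CD content of (-inf, lower]
    right = 1.0 - cd.cdf(upper)   # CD content of [upper, inf)
    return left + right, max(left, right)


p_s, p_star = bioequivalence_p_values(estimate=1.0, std_error=0.05)
p_s_close, p_star_close = bioequivalence_p_values(estimate=0.85, std_error=0.05)
print(p_s, p_star, p_s_close, p_star_close)
```

Because $p^*$ is a maximum and $p_s(C)$ is a sum of the same non-negative terms, $p^* \le p_s(C)$ always holds, which is the sense in which $p^*$ gives the larger rejection region.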

Example 3.2. In the standard classification problem, the parameter space is
divided into $k$ regions. The task is to decide which one contains the true value of
the parameter $\theta$. A natural (but probably over-simplified) suggestion is to compare
the CD contents of the $k$ regions and assign $\theta$ to the one which has the maximum CD
probability.
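
The suggestion in Example 3.2 amounts to an argmax over CD contents. A minimal sketch for a scalar parameter follows, assuming the regions are consecutive intervals defined by cut points and that the CD is available as a `statistics.NormalDist` object; the helper name `classify` and the numbers are hypothetical.

```python
from statistics import NormalDist


def classify(cd, cut_points):
    """Return (index of the region with the largest CD content, all contents).

    Regions are (-inf, c1], (c1, c2], ..., (c_m, inf) for sorted cut points.
    """
    cuts = sorted(cut_points)
    cdf_values = [0.0] + [cd.cdf(c) for c in cuts] + [1.0]
    contents = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
    best = max(range(len(contents)), key=contents.__getitem__)
    return best, contents


best, contents = classify(NormalDist(mu=1.3, sigma=0.5), cut_points=[0.0, 2.0])
print(best, contents)
```

The contents always sum to one, so they can also be read as a summary of how the CD spreads its mass over the candidate regions.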

4. Profile likelihood functions and CDs 

We examine here the connection between the concepts of profile likelihood function
and asymptotic CD. Let $x_1, x_2, \ldots, x_n$ be independent sample draws from a
parametric distribution with density $f_\eta(x)$, where $\eta$ is a $p \times 1$ vector of
unknown parameters. Suppose we are interested in a scalar parameter $\theta = s(\eta)$,
where $s(\cdot)$ is a second-order differentiable mapping from $\mathbb{R}^p$ to
$\Theta \subset \mathbb{R}$. To make an inference about $\theta$, one often obtains the
log-profile likelihood function

$$\ell_n(\theta) = \sum_{i=1}^{n} \log f_{\hat\eta(\theta)}(x_i), \quad \text{where } \hat\eta(\theta) = \arg\max_{\{\eta :\, s(\eta) = \theta\}} \sum_{i=1}^{n} \log f_\eta(x_i).$$

Denote $\ell_n^*(\theta) = \ell_n(\theta) - \ell_n(\hat\theta)$, where
$\hat\theta = \arg\max_\theta \ell_n(\theta)$ is the maximum likelihood estimator of the
unknown parameter $\theta$. We prove below that $e^{\ell_n^*(\theta)}$, after normalization
(with respect to $\theta$, so that the area under its curve is one), is the density
function of an aCD for $\theta$. The technique used to prove the main result is similar to
that used in the proofs of Theorems 2 and 3 in Efron [3].

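Let $i_n^{-1} = -\frac{1}{n}\ell_n''(\hat\theta)$ and
$i_0^{-1}(\theta) = \lim_{n \to +\infty} -\frac{1}{n}\ell_n''(\theta)$. Assume that the true
value $\theta = \theta_0$ is in $\Theta^\circ$, the interior of $\Theta$, and
$i_0^{-1}(\theta_0) > 0$. The key assumption is that

$$\sqrt{n}(\hat\theta - \theta_0)/\sqrt{i_n} \to N(0, 1).$$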

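In addition to the regularity conditions that ensure the asymptotic normality of
$\hat\theta$, we make the following three mild assumptions. They are satisfied in the cases
of commonly used distributions.

(i) There exists a function $k(\theta)$, such that $\ell_n^*(\theta) \le -n k(\theta)$ for all
large $n$, a.s., and $\int_\Theta e^{-c k(\theta)}\, d\theta < +\infty$, for a constant $c > 0$.

(ii) There exists an $\epsilon > 0$, such that

$$c_\epsilon = \inf_{|\theta - \theta_0| < \epsilon} i_0^{-1}(\theta) > 0 \quad \text{and} \quad k_\epsilon = \inf_{|\theta - \theta_0| > \epsilon} k(\theta) > 0.$$

(iii) $\ell_n''(\theta)$ satisfies a Lipschitz condition of order 1 around $\theta_0$.

For $y \in \Theta$, write

$$H_n(y) = \frac{1}{\sqrt{2\pi i_n/n}} \int_{(-\infty, y] \cap \Theta} e^{-\frac{(\theta - \hat\theta)^2}{2 i_n/n}}\, d\theta \quad \text{and} \quad G_n(y) = \frac{1}{c_n} \int_{(-\infty, y] \cap \Theta} e^{\ell_n^*(\theta)}\, d\theta,$$

where $c_n = \int_\Theta e^{\ell_n^*(\theta)}\, d\theta$. We assume that $c_n < \infty$ for all $n$
and $\mathbf{X}_n$; condition (i) implies $c_n < \infty$ for $n$ large.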

We prove the following theorem. 

Theorem 4.1. $G_n(\theta) = H_n(\theta) + o_p(1)$ for each $\theta \in \Theta$.

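Proof. We prove the case with $\Theta = (-\infty, +\infty)$; the other cases can be proved
similarly. Let $\epsilon > 0$ be as in condition (ii). We first prove, for any fixed $s > 0$,

(4.1) $\int_{-\infty}^{\theta_0 - \epsilon} e^{\ell_n^*(\theta)}\, d\theta = o_p\bigl(\tfrac{1}{n^s}\bigr)$ and $\int_{\theta_0 + \epsilon}^{+\infty} e^{\ell_n^*(\theta)}\, d\theta = o_p\bigl(\tfrac{1}{n^s}\bigr)$.

Note, by conditions (i) and (ii), when $n$ is large enough, we have
$\ell_n^*(\theta) + s \log n \le -n k(\theta) + n k_\epsilon/2 = -n(k(\theta) - k_\epsilon/2) \le -n k_\epsilon/2$,
for $|\theta - \theta_0| > \epsilon$. By Fatou's Lemma,

$$\limsup_{n \to +\infty} \int_{-\infty}^{\theta_0 - \epsilon} e^{\ell_n^*(\theta) + s \log n}\, d\theta \le \int_{-\infty}^{\theta_0 - \epsilon} \limsup_{n \to +\infty} e^{\ell_n^*(\theta) + s \log n}\, d\theta \le \int_{-\infty}^{\theta_0 - \epsilon} \lim_{n \to +\infty} e^{-n k_\epsilon/2}\, d\theta = 0.$$

This proves the first equation in (4.1). The same is true for the second equation in
(4.1).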

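In the case when $|\theta - \theta_0| < \epsilon$, by Taylor expansion

(4.2) $\ell_n^*(\theta) = \frac{1}{2}\ell_n''(\tilde\theta)(\theta - \hat\theta)^2$, for $\tilde\theta$ between $\theta$ and $\hat\theta$.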

From condition (ii) we have 1^(9) < —^Ce{9 — 9)^, when n is large. Thus, one can 
prove 

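(4.3) $\int_{\theta_0 - \epsilon}^{\theta_0 - \frac{\log n}{\sqrt{n}}} e^{\ell_n^*(\theta)}\, d\theta = o_p\bigl(\tfrac{1}{\sqrt{n}}\bigr)$, and $\int_{\theta_0 + \frac{\log n}{\sqrt{n}}}^{\theta_0 + \epsilon} e^{\ell_n^*(\theta)}\, d\theta = o_p\bigl(\tfrac{1}{\sqrt{n}}\bigr)$.

Now, consider the case when $|\theta - \theta_0| < \frac{\log n}{\sqrt{n}}$. By (4.2) and
condition (iii), one has

(4.4) $\int_{\theta_0 - \frac{\log n}{\sqrt{n}}}^{\theta} e^{\ell_n^*(\theta)}\, d\theta = \int_{\theta_0 - \frac{\log n}{\sqrt{n}}}^{\theta} e^{-\frac{(\theta - \hat\theta)^2}{2 i_n/n}}\, d\theta + o_p\bigl(\tfrac{1}{\sqrt{n}}\bigr)$.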




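From (4.1), (4.3) and (4.4), it easily follows that

(4.5) $\frac{1}{\sqrt{2\pi i_n/n}} \int_{-\infty}^{\theta} e^{\ell_n^*(\theta)}\, d\theta = \frac{1}{\sqrt{2\pi i_n/n}} \int_{-\infty}^{\theta} e^{-\frac{(\theta - \hat\theta)^2}{2 i_n/n}}\, d\theta + o_p(1)$

for all $\theta \in (-\infty, \infty)$. Note that (4.5) implies that
$c_n = \sqrt{2\pi i_n/n} + o_p\bigl(\tfrac{1}{\sqrt{n}}\bigr)$. So (4.5) is, in fact,
$G_n(\theta) = H_n(\theta) + o_p(1)$ for all $\theta \in (-\infty, \infty)$. This proves the
theorem. □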

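Remark 4.1. At $\theta = \theta_0$, $H_n(\theta_0) = \Phi\bigl(\frac{\theta_0 - \hat\theta}{\sqrt{i_n/n}}\bigr) + o_p(1) \to U(0, 1)$
in distribution. It follows from this theorem that $G_n(\theta_0) \to U(0, 1)$ in
distribution, thus $G_n$ is an aCD.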

Remark 4.2. It is well known that at the true value $\theta_0$,
$-2\ell_n^*(\theta_0) = -2\{\ell_n(\theta_0) - \ell_n(\hat\theta)\}$ is asymptotically
equivalent to $n(\theta_0 - \hat\theta)^2/i_n$; see, e.g., Murphy and van der
Vaart |2fl]. In the proof of the above theorem, we need to extend this result to a
shrinking neighborhood of $\theta_0$, and control the "tails" in our normalization of
$e^{\ell_n^*(\theta)}$ (with respect to $\theta$, so that the area underneath the curve is 1).
This normalization produces a proper distribution function, and Theorem 4.1 is an
asymptotic result for this distribution function.
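
Theorem 4.1 can be checked numerically in a case where the profile likelihood has a closed form. The sketch below assumes a normal sample with unknown mean $\theta$ and unknown variance (the nuisance parameter), for which $\ell_n^*(\theta) = -\frac{n}{2}\log\bigl(1 + (\theta - \bar{x})^2/s^2\bigr)$ with $s^2$ the maximum likelihood variance estimate, and $i_n = s^2$. The normalized profile likelihood $G_n$ is integrated with a trapezoid rule and compared with the normal $H_n$. The function name, grid settings and simulated data are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def max_gap_profile_vs_normal(xs, grid_size=4001, half_width=10.0):
    """Largest |G_n(y) - H_n(y)| over a grid, for a normal sample with
    unknown mean (parameter of interest) and unknown variance (nuisance)."""
    n = len(xs)
    xbar = fmean(xs)
    s2 = sum((x - xbar) ** 2 for x in xs) / n  # MLE of the variance; equals i_n here
    s = math.sqrt(s2)
    step = 2.0 * half_width * s / (grid_size - 1)
    grid = [xbar - half_width * s + i * step for i in range(grid_size)]
    # exp of the normalized log-profile likelihood l_n^*(theta)
    dens = [math.exp(-0.5 * n * math.log1p((t - xbar) ** 2 / s2)) for t in grid]
    # cumulative trapezoid rule; the last entry approximates c_n
    cum = [0.0]
    for i in range(1, grid_size):
        cum.append(cum[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    c_n = cum[-1]
    h_n = NormalDist(mu=xbar, sigma=s / math.sqrt(n))
    return max(abs(cum[i] / c_n - h_n.cdf(grid[i])) for i in range(grid_size))


rng = random.Random(7)
sample = [rng.gauss(3.0, 2.0) for _ in range(50)]
gap = max_gap_profile_vs_normal(sample)
print(gap)
```

In this example $G_n$ is a rescaled Student-t distribution function, so the gap to the normal $H_n$ is small already at $n = 50$ and shrinks as $n$ grows, in line with the $o_p(1)$ statement.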

As a special case, the likelihood function in the family of one-parameter
distributions (i.e., $\eta$ is a scalar parameter) is proportional to an aCD density function.
There is also a connection between the concepts of aCD and other types of
likelihood functions, such as Efron's implied likelihood function, Schweder and Hjort's
reduced likelihood function, etc. In fact, one can easily conclude from Theorems
1 and 2 of Efron [3] that in an exponential family, both the profile likelihood and
the implied likelihood (Efron [3]) are aCD densities, after a normalization (with
respect to $\theta$). Schweder and Hjort [l^l proposed the reduced likelihood function,
which itself is proportional to a CD density for a specially transformed parameter.
See Welch and Peers [i^l and Fisher [l^ for earlier accounts of likelihood function
based CDs in the case of single parameter families.

5. Multiparameter joint CD 

Let us first note that in higher dimensions, the cdf is not as useful a notion, at least
for our purposes here. The main reasons are: (a) The region $F(\mathbf{x}) \le \alpha$ is
not of much interest in $\mathbb{R}^k$. (b) The property $F(\mathbf{X}) \sim U[0, 1]$,
when $\mathbf{X} \sim F$, is lost!

5.1. Development of multiparameter CD through Cramér-Wold device

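The following definition of a multiparameter CD is modeled on the way a random vector
is shown to have a particular multivariate distribution through its linear
combinations (arguing via characteristic functions).

Definition 5.1. A distribution $H_n(\cdot) = H_n(\mathbf{X}_n, \cdot)$ on $\mathbb{R}^k$ is a
CD in the linear sense (l-CD) for a $k \times 1$ parameter vector $\boldsymbol\theta$ if and
only if for any $k \times 1$ vector $\boldsymbol\lambda$, the conditional distribution of
$\boldsymbol\lambda'\boldsymbol\xi_n$ given $\mathbf{X}_n$ is a CD for
$\boldsymbol\lambda'\boldsymbol\theta$, where the $k \times 1$ random vector
$\boldsymbol\xi_n$ has the distribution $H_n(\cdot)$ given $\mathbf{X}_n$.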

Using the definition of asymptotic CD on the real line, one has a natural extension
of the above definition to the asymptotic version. With this definition, for example,
the raw bootstrap distribution in $\mathbb{R}^k$ remains an asymptotic CD, under asymptotic
symmetry.

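An l-CD $H_{1,n}$ is more precise than $H_{2,n}$ (both for the same $\boldsymbol\theta$ in
$\mathbb{R}^k$), if the CD for $\boldsymbol\lambda'\boldsymbol\theta$ given by $H_{1,n}$ is
more precise than that given by $H_{2,n}$, for all $k \times 1$ vectors
$\boldsymbol\lambda$. For the normal mean vector $\boldsymbol\mu$, with known dispersion
$\Sigma$, $1 - \Phi(\sqrt{n}\,\Sigma^{-1/2}(\bar{\mathbf{X}}_n - \boldsymbol\mu))$
is the most precise CD for $\boldsymbol\mu$.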
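
Definition 5.1 can be illustrated for a bivariate normal mean with known dispersion. In this sketch, which is an assumption-laden example rather than the paper's construction, $\boldsymbol\xi_n \sim N(\bar{\mathbf{x}}, \Sigma/n)$ given the data, so $\boldsymbol\lambda'\boldsymbol\xi_n$ is normal with mean $\boldsymbol\lambda'\bar{\mathbf{x}}$ and variance $\boldsymbol\lambda'\Sigma\boldsymbol\lambda/n$. The simulation checks that the resulting CD for $\boldsymbol\lambda'\boldsymbol\theta$ yields intervals with the intended coverage. All names and numerical values are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def cd_linear(lam, xbar, sigma, n, value):
    """CD of lam'theta evaluated at `value`, when xi_n ~ N(xbar, sigma / n)."""
    k = len(lam)
    center = sum(lam[i] * xbar[i] for i in range(k))
    variance = sum(lam[i] * sigma[i][j] * lam[j]
                   for i in range(k) for j in range(k)) / n
    return PHI((value - center) / math.sqrt(variance))


def coverage(theta, sigma, n, lam, level=0.90, reps=20000, seed=2):
    """Fraction of simulated samples whose equal-tail CD interval for
    lam'theta contains the true value (2 x 2 case only)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 dispersion matrix, written out by hand
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    target = sum(l * t for l, t in zip(lam, theta))
    tail = (1.0 - level) / 2.0
    hits = 0
    for _ in range(reps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xbar = [theta[0] + l11 * z1 / math.sqrt(n),
                theta[1] + (l21 * z1 + l22 * z2) / math.sqrt(n)]
        u = cd_linear(lam, xbar, sigma, n, target)
        if tail <= u <= 1.0 - tail:
            hits += 1
    return hits / reps


cov = coverage(theta=[1.0, 2.0], sigma=[[1.0, 0.5], [0.5, 2.0]], n=25, lam=[1.0, -1.0])
print(cov)
```

The true value lies in the interval exactly when the CD evaluated at it falls between the two tail probabilities, so the estimated coverage should be close to the nominal 90%.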

Let $A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta)$ have a completely known absolutely
continuous distribution $G(\cdot)$, where $A_n$ is a non-singular, non-random matrix. Then
$H_n(\cdot)$ defined by $H_n(\boldsymbol\theta) = 1 - G(A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta))$
is an l-CD for $\boldsymbol\theta$. If $G$ is the limiting distribution then the above
$H_n$ is an asymptotic CD, in which case $A_n$ can be data-dependent. A useful property
of this extension to $\mathbb{R}^k$ is the fact that the CD content of a region behaves
like p-values (in the limit), as it does in the real line case. See Theorem 4.2 of Liu
and Singh [ist for the special case of bootstrap distribution in this context.

It is evident from the definition that one can obtain a CD for any linear function
$A\boldsymbol\theta$ from that of $\boldsymbol\theta$, by obtaining the distribution of
$A\boldsymbol\xi_n$. The definition also entails that for a vector of linear functions,
the derived joint distribution (from that of $\boldsymbol\xi_n$) is an l-CD. For a
non-linear function though, one can in general get an asymptotic CD only. Let
$H_n(\cdot)$ be an asymptotic l-CD for $\boldsymbol\theta$. Suppose random vector
$\boldsymbol\xi_n$ follows the distribution $H_n$. Consider a possibly non-linear function
$g(\boldsymbol\theta) : \mathbb{R}^k \to \mathbb{R}^\ell$, $\ell \le k$.
Let each coordinate of $g(\boldsymbol\theta)$ have continuous partial derivatives in a
neighborhood of $\boldsymbol\theta_0$. Furthermore, suppose the vector of partial
derivatives at $\boldsymbol\theta = \boldsymbol\theta_0$ is non-zero, for each coordinate
of $g(\boldsymbol\theta)$.

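Theorem 5.1. Under the setup assumed above, the distribution of $g(\boldsymbol\xi_n)$ is an
asymptotic l-CD of $g(\boldsymbol\theta)$, at $\boldsymbol\theta = \boldsymbol\theta_0$,
provided the $H_n$ probability of $\{\|\boldsymbol\theta - \boldsymbol\theta_0\| > \epsilon\}$
$\xrightarrow{p} 0$, for any $\epsilon > 0$.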

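Proof. The result follows from the following Taylor expansion over the set
$\|\boldsymbol\xi_n - \boldsymbol\theta_0\| < \epsilon$:
$g(\boldsymbol\xi_n) = g(\boldsymbol\theta_0) + \Delta(\boldsymbol\xi_n^*)(\boldsymbol\xi_n - \boldsymbol\theta_0)$,
where $\Delta(\boldsymbol\xi_n^*)$ is the matrix of partial derivatives of $g(\cdot)$ at
$\boldsymbol\xi_n^*$ lying within the $\epsilon$-neighborhood of $\boldsymbol\theta_0$. □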

Remark 5.1. Given a joint l-CD for $\boldsymbol\theta$, Theorem 5.1 prescribes an asymptotic
method for finding a joint l-CD for a vector of functions of $\boldsymbol\theta$, or for
just a single function of $\boldsymbol\theta$. This method can inherit any skewness (if it
exists) in the l-CD of $g(\boldsymbol\theta)$. This will be missed if direct asymptotics is
done on $g(\hat{\boldsymbol\theta}) - g(\boldsymbol\theta_0)$.
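
The method of Theorem 5.1 is easy to carry out by simulation: draw $\boldsymbol\xi_n$ from $H_n$, apply $g$, and read off quantiles. The sketch below assumes, purely for illustration, that $H_n$ is $N(\bar{\mathbf{x}}, \Sigma/n)$ with a known lower-triangular Cholesky factor of $\Sigma$, and takes $g$ to be a ratio of two coordinates. The helper names and all numbers are hypothetical.

```python
import random


def cd_draws_of_function(g, xbar, chol, n, draws=20000, seed=0):
    """Sorted draws of g(xi_n), where xi_n ~ N(xbar, chol chol' / n).

    `chol` is an assumed lower-triangular Cholesky factor of the dispersion.
    """
    rng = random.Random(seed)
    k = len(xbar)
    values = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        xi = [xbar[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) / n ** 0.5
              for i in range(k)]
        values.append(g(xi))
    values.sort()
    return values


def quantile(sorted_values, q):
    """Simple empirical quantile of already sorted values."""
    index = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[index]


ratio_draws = cd_draws_of_function(lambda t: t[0] / t[1],
                                   xbar=[2.0, 4.0],
                                   chol=[[1.0, 0.0], [0.0, 1.0]],
                                   n=100)
lower, upper = quantile(ratio_draws, 0.05), quantile(ratio_draws, 0.95)
print(lower, upper)
```

Because the draws are pushed through $g$ directly, any asymmetry of the distribution of $g(\boldsymbol\xi_n)$ shows up in the two quantiles, which a symmetric normal approximation centered at the point estimate would not capture.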

On the topic of combining joint l-CDs, one natural approach is by using the
univariate CDs of linear combinations, where the combination is carried out by the
methods discussed in Singh, Xie and Strawderman [26]. The problem of finding a
combined joint l-CD comes down to finding a joint distribution agreeing with the
univariate distributions of linear combinations, if it exists. The existence problem
(when it is not obvious) could perhaps be tackled via characteristic functions and
Bochner's Theorem. It may also be noted that an asymptotic combined l-CD of a
nonlinear function of the parameters can be constructed via Theorem 5.1 and the
methods of Singh, Xie and Strawderman [26].

5.2. Confidence distribution and data depth 

Another requirement on a probability distribution $H_n$ on $\mathbb{R}^k$ (based on a data
set $\mathbf{X}_n$) to be a confidence distribution for $\boldsymbol\theta$ (a $k$-column
vector) should naturally be: the $100t\%$ "central regions" of $H_n$ are confidence regions
for $\boldsymbol\theta$, closed and bounded, having the coverage level $100t\%$. We define
such an $H_n$ to be a c-CD, where c stands for circular or central. We note in Remark 5.2
that the notions of l-CD and c-CD match in a special setting.

Definition 5.2. A function $H_n(\cdot) = H_n(\cdot, \mathbf{X}_n)$ on
$\Theta \subset \mathbb{R}^k$ is called a Confidence Distribution in the circular sense
(c-CD) for the $k \times 1$ multiparameter $\boldsymbol\theta$, if (i) it is a
probability distribution function on the parameter space $\Theta$ for each fixed sample
set $\mathbf{X}_n$, and (ii) the $100t\%$ "central region" of $H_n(\cdot)$ is a confidence
region for $\boldsymbol\theta$, having the coverage level $100t\%$ for each $t$.

By central regions of a distribution $G$, statisticians usually mean the elliptical
regions of the type $\mathbf{y}'\Sigma_G^{-1}\mathbf{y} \le a$. This notion of central
regions turns out to be a special case of central regions derived from the notion of
data-depth. See Liu, Parelius and Singh [16] among others, for various concepts of
data-depth and more. The elliptical regions arise out of so-called Mahalanobis-depth
(or distance). The phrase data-depth was coined by J. Tukey. See Tukey [13], for the
original notion of Tukey's depth or half-space depth. In recent years, the first author,
together with R. Liu, has been involved in developing data-depth, especially its
application. For the reader's convenience, we provide here the definition of Tukey's
depth: $TD_G(\mathbf{x}) = \min P_G(H)$, where the minimum is over all half-spaces $H$
containing $\mathbf{x}$. On the real line, $TD_G(x) = \min\{G(x), 1 - G(x)\}$. A notion
of data-depth is called affine-invariant if
$D(\mathbf{X}, \mathbf{x}) = D(A\mathbf{X} + \mathbf{b}, A\mathbf{x} + \mathbf{b})$, where
$D(\mathbf{X}, \mathbf{x})$ is the depth of the point $\mathbf{x}$ w.r.t. the distribution
of a random vector $\mathbf{X}$. Here $A\mathbf{X} + \mathbf{b}$ is a linear transform
of $\mathbf{X}$. The above mentioned depths are affine-invariant. For an elliptical
population, the depth contours agree with the density contours (see Liu and Singh [iTj]).

For a given depth $D$ on a distribution $G$ on $\mathbb{R}^k$, let us define the centrality
function

$$C(\mathbf{x}) = C(G, D, \mathbf{x}) = P_G\{\mathbf{y} : D_G(\mathbf{y}) \le D_G(\mathbf{x})\}.$$

Thus, requirement (ii) stated earlier on $H_n$ in Definition 5.2 can be restated as:
$\{\mathbf{x} : C(H_n, D, \mathbf{x}) \ge \alpha\}$ is a $100(1 - \alpha)\%$ confidence
region for $\boldsymbol\theta$, for all $0 < \alpha < 1$. This is equivalent to

Requirement (ii)': $C(\boldsymbol\theta_0) = C(H_n, D, \boldsymbol\theta_0)$, as a function
of the sample set $\mathbf{X}_n$, has the $U[0, 1]$ distribution.

For a c-CD $H_n$, let us call the function $C(\cdot) = C(H_n, D, \cdot)$ a confidence
centrality function (CCF). Here $D$ stands for the associated depth.

Going back to the real line, if $H_n$ is a CD, one important CCF associated with
$H_n$ is

$$C_n(x) = 2\min\{H_n(x), 1 - H_n(x)\}.$$

This CCF gives rise to the two-sided equal-tail confidence intervals. The depth
involved is Tukey's depth on the real line.
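
The link between this CCF and equal-tail intervals can be seen in a few lines. The sketch assumes a normal CD on the real line, represented by a `statistics.NormalDist` object with illustrative center and spread; the function names are hypothetical.

```python
from statistics import NormalDist


def ccf(cd, x):
    """Confidence centrality function C_n(x) = 2 * min{H_n(x), 1 - H_n(x)}."""
    h = cd.cdf(x)
    return 2.0 * min(h, 1.0 - h)


def equal_tail_interval(cd, alpha):
    """Endpoints of the 100(1 - alpha)% equal-tail interval of the CD."""
    return cd.inv_cdf(alpha / 2.0), cd.inv_cdf(1.0 - alpha / 2.0)


cd = NormalDist(mu=1.0, sigma=0.2)   # assumed CD, for illustration only
alpha = 0.10
lo, hi = equal_tail_interval(cd, alpha)
print(lo, hi, ccf(cd, lo), ccf(cd, hi), ccf(cd, cd.mean))
```

The CCF equals $\alpha$ at both endpoints, exceeds $\alpha$ strictly inside the interval and reaches 1 at the median of the CD, so the region $\{x : C_n(x) \ge \alpha\}$ is exactly the equal-tail interval.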

Next, we present a general class of sample dependent multivariate distributions
which meet requirement (ii)'. Let $A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta)$ have a
cdf $G_n(\cdot)$ independent of parameters. The nonsingular matrix $A_n$ could involve the
data $\mathbf{X}_n$. Typically $A_n$ is a square root of the inverse dispersion matrix of
$\hat{\boldsymbol\theta}$ (or an estimated dispersion matrix). Let $\boldsymbol\eta_n$ be an
(external) random vector, independent of $\mathbf{X}_n$, which has the cdf $G_n$. Let $H_n$
be the cdf of $\hat{\boldsymbol\theta} - A_n^{-1}\boldsymbol\eta_n$ for a given vector
$\mathbf{X}_n$.

Theorem 5.2. Let $D$ be an affine-invariant data-depth such that the boundary sets
$\{\mathbf{x} : C(G_n, D, \mathbf{x}) = t\}$ have zero probability under $G_n$ (recall the
centrality function $C(\cdot)$). Then $H_n(\cdot)$ defined above meets Requirement (ii)' and
it is a c-CD.

Proof. For any $t \in (0, 1)$, in view of the affine-invariance of $D$, the set

$\{\mathbf{X}_n : C(H_n, D, \boldsymbol\theta_0) \le t\} = \{\mathbf{X}_n : C(G_n, D, A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)) \le t\}$
$= \{\mathbf{X}_n$ corresponding to $A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)$ lying in the outermost $100t\%$
of the population $G_n\}$.

Its probability content $= t$, since $A_n(\hat{\boldsymbol\theta} - \boldsymbol\theta_0)$ is
distributed as $G_n$. □
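
Requirement (ii)' can be checked by simulation in a case where the centrality function has a closed form. The sketch assumes a bivariate normal setting with identity dispersion, so that $H_n = N(\hat{\boldsymbol\theta}, I/n)$, and uses Mahalanobis depth. In two dimensions the set of points no deeper than $\mathbf{x}$ is the outside of a circle, and its $H_n$ probability is the chi-square (2 degrees of freedom) survival function $e^{-d^2/2}$ at the squared Mahalanobis distance $d^2$. Names, seed and numbers are illustrative assumptions.

```python
import math
import random


def centrality_at(point, center, n):
    """C(H_n, D, point) for H_n = N(center, I / n) in R^2 with Mahalanobis depth.

    Valid only for dimension 2, where P(chi-square_2 >= d2) = exp(-d2 / 2).
    """
    d2 = n * sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-0.5 * d2)


def simulate_centrality(theta0, n, reps=20000, seed=3):
    """Values of C(H_n, D, theta0) over repeated samples with true value theta0."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n)
    values = []
    for _ in range(reps):
        theta_hat = [rng.gauss(t, sd) for t in theta0]  # estimator ~ N(theta0, I / n)
        values.append(centrality_at(theta0, theta_hat, n))
    return values


values = simulate_centrality([0.5, -1.0], n=30)
mean_c = sum(values) / len(values)
frac_below = sum(v <= 0.1 for v in values) / len(values)
print(mean_c, frac_below)
```

If the centrality at the true value is uniform on $[0, 1]$, its mean should be near 0.5 and about 10% of the values should fall below 0.1, which is what the simulation is meant to show.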

Remark 5.2. The discussion preceding Theorem 5.1 and the result in Theorem
5.2 imply that this $H_n$ is both an l-CD and a c-CD of $\boldsymbol\theta$ when $A_n$ is
independent of the data. The l-CD and c-CD coincide in this special case! When $A_n$ is
data-dependent, one has an exact c-CD, but only an asymptotic l-CD.

Given two joint c-CDs $H_{1n}$ and $H_{2n}$, based on the same data set, their precision
could be compared using stochastic comparison between their CCFs involving the
same data-depth. More precisely, let $C_1$, $C_2$ be the CCFs of two joint CDs $H_{1n}$
and $H_{2n}$, induced by the same notion of data-depth, i.e.,
$C_1(\mathbf{x}) = C(H_{1n}, D, \mathbf{x})$ = {fraction of the $H_{1n}$ population having
$D$-depth less than or equal to that of $\mathbf{x}$}. One would define: $H_{1n}$ is more
precise than $H_{2n}$ when $\boldsymbol\theta = \boldsymbol\theta_0$ prevails, if
$C_1(\mathbf{x}) \overset{sto}{\le} C_2(\mathbf{x})$, under
$\boldsymbol\theta = \boldsymbol\theta_0$, for all $\mathbf{x} \ne \boldsymbol\theta_0$.